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Abstract 

The use of multicomponent images has become widespread with the improvement of multisensor 
systems having increased spatial and spectral resolutions. However, the observed images are often 
corrupted by an additive Gaussian noise. In this paper, we are interested in multichannel image denoising 
based on a multiscale representation of the images. A multivariate statistical approach is adopted to take 
into account both the spatial and the inter-component correlations existing between the different wavelet 
subbands. More precisely, we propose a new parametric nonlinear estimator which generalizes many 
reported denoising methods. The derivation of the optimal parameters is achieved by applying Stein's 
principle in the multivariate case. Experiments performed on multispectral remote sensing images clearly 
indicate that our method outperforms conventional wavelet denoising techniques. 
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I. Introduction 

Many real world images are contaminated by noise during their acquisition and/or transmission. In 
particular, multichannel imaging is prone to quality degradation due to the imperfectness of the sensors 
often operating in different spectral ranges [1], [2]. In order to alleviate the influence of such disturbing 
artifacts on subsequent analysis procedures, denoising appears as a crucial initial step in multicomponent 
image enhancement. In this context, attention has been paid to developing efficient denoising methods. 
Usually, the noise removal problem is considered as a regression problem. The challenge thus resides in 
finding realistic statistical models which lead to both efficient and tractable denoising approaches. To this 
respect, linearly transforming the signal from the spatial domain to a more suitable one may drastically 
improve the denoising performance. The rationale for such a transformation is the observation that 
some representations possessing good energy concentration and decorrelation properties tend to simplify 
the statistical analysis of many natural images. For instance, the Discrete Wavelet Transform (DWT) 
constitutes a powerful tool for image denoising [3], [4]. The DWT, computed for each channel/component 
separately, usually yields "larger" coefficients for signal features and "smaller" ones for noise since it 
forms an unconditional basis for several classes of regular signals [5]. For monochannel signals or images, 
the seminal work of Donoho and Johnstone has shown that a mere wavelet coefficient thresholding 
constitutes a simple yet effective technique for noise reduction [6]. Based on Stein's Unbiased Risk 
Estimator (SURE), they have proposed the SUREshrink technique [7]. Subsequently, several extensions 
of their work have been performed, e.g. in [8]-[ll]. Recently, the denoising problem in the wavelet 
domain has gained more attention in the case of multichannel images. Indeed, the increasing need for 
multicomponent images in several applications such as medical imaging and remote sensing has motivated 
a great interest in designing tractable denoising methods dedicated to this kind of images. Componentwise 
processing can be performed for each modality, but a joint denoising should be preferred in order to exploit 
the cross-channel similarities in an efficient way [12]. The problem of a joint estimation in the wavelet 
domain has been formulated in [13]. More precisely, the use of joint threshold estimators was investigated 
in two situations: overcomplete representations of a noisy image [14] and multiple observations of the 
same image [13]. A scale-adaptive wavelet thresholding was designed for multichannel images in the case 
of an i.i.d. (independent identically distributed) Gaussian vector noise whose components are independent 
and have the same variance [15]. In a Bayesian framework, several prior models have been considered such 
as multivariate Bernoulli-Gaussian ones [16]. A generalized Gaussian distribution was also considered for 
modelling the marginal distribution of each subband in each channel and a simple shrinkage was applied 
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depending on the local spectral activity [17]. A vector-based least-square approach was also investigated 
in the wavelet domain [18]. Recently, the application of Stein's principle [19]-[21] in the multivariate 
case has motivated the design of a nonlinear estimator in [22]. In this paper, links existing between 
the proposed nonlinear estimator and Bayesian approaches were discussed. In particular, the structure of 
the estimator was motivated by a multivariate Bemoulh-Gaussian model reflecting the sparseness of the 
wavelet representation as well as the statistical dependencies existing between the different components. 
We point out that the form of the estimator in [22] is not the same as the one proposed in this paper. 
In particular, the estimator in [22] does not involve any thresholding operation. Moreover, the estimator 
does not allow to take into account spatial dependencies but only those existing between the multichannel 
data at a given position. 

In parallel to these works, the idea of performing a joint spatial denoising of the coefficients, rather 
than using a conventional term-by-term processing, has emerged in statistics. This idea, stemming from 
an incentive for capturing statistical dependences between spatial neighboring wavelet coefficients, was 
first investigated for single component images in both non-Bayesian and Bayesian cases [23], [24]. A 
successful extension was also carried out in the case of multichannel images by considering hybrid 
(spectral and spatial) neighborhoods [25]. 

In this paper, we aim at building a new estimator allowing to take into account the various correlations 
existing in multichannel image data. This estimator also provides a unifying framework for several 
denoising methods proposed in the literature. More precisely, our contributions are the following. 

• The method applies to any vector-valued data embedded in a multivariate Gaussian noise. As illustrated 
later on, many examples of such multivariate contexts (inter-component, spatial and inter-scale) can be 
found. They naturally include multivariate denoising obtained with vectors of samples sharing the same 
spatial position in different channels. 

• The estimator can be computed in any image representation domain. For instance, in addition to wavelet 
domains, usually considered in conventional denoising methods, we propose to exploit more general frame 
decompositions such as the dual-tree wavelet transform [26], [27]. 

• The computation of the estimated value can be performed with the help of various observations. Again, 
our method includes most of the reported estimation methods acting in that way. Furthermore, it offers 
a great flexibility in the choice of these auxiliary data. 

• The form of the proposed estimator is quite general. More precisely, we focus on deriving thresholding 
estimators including an exponent parameter and a linear part. Optimal parameters are derived from Stein's 
principle. 
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• The denoising approach allows to handle any covariance matrix between the multichannel noise 
components. 

Notwithstanding its generality, the proposed approach remains tractable and compares quite favorably 
with state-of-the-art methods. The paper is organized as follows. In Section [III we present the relevant 
background and introduce notations for a general formulation of the estimator, based on the concept of 
Reference Observation Vector. In Section |llll we describe the proposed multivariate nonlinear estimator. 
In Section |IVl we give the specific form taken by this new estimator for multichannel images decomposed 
by a wavelet transform or an M-band dual-tree wavelet transform. In Section |Vl experimental results are 
given for remote sensing images showing that the proposed estimator outperforms existing ones and some 
concluding remarks are drawn in Section IVll 

Throughout this paper, the following notations will be used: let M be an integer greater than or equal 
to 2, Nm = {0,...,M- 1} and N]^ = {1,...,M- 1}; Z, M and M+ are the sets of integers, reals and 
positive reals; [.] denotes rounding towards the immediate upper integer. Besides, a denotes the Fourier 
transform of a function a, (5„,),„ez is the Kronecker sequence (equal to 1 if m = and otherwise), 
(/)^ = / if / > and otherwise and 1{A} = 1 if condition A is true and otherwise. 

II. Background 

A. General formulation of the multichannel estimator 

In multisensor imaging, B vectors of observed data samples (r(')(k))k£K, . . . , (r(^'(k))kg]K> are provided 
where B is the number of effective sensors and K is a set of spatial indices (K C 1?). Generally, these 
data correspond to noisy realizations of B unknown signals (^(''(k))^^!^, (5^*'(k))k(EK> respectively. 
Subsequently, our task will consist in devising methods to reduce the noise present in the observations. 
Two alternatives can be envisaged in this context. On the one hand, a monochannel approach builds 
an estimator s (k) of 5*'''(k) only from the observations (r('''(k))k(EK> for each channel b G {1,...,B}. 
On the other hand, a multivariate technique attempts to estimate 5'*'')(k) by accounting not only for 
the individual data set {r(^)(k)}k(EK, but also for the remaining ones {r(^)(k)}k(EK, {'■'^~'H'^)}k(£K> 

{r(''+i)(k)}keK, ...,{r(^)(k)}k6K. 

Thus, one of the simplest relevant denoising approach consists in calculating the estimated value s (k) 
of 5(k) as 

^^'|k)=/(rW(k)) (1) 

where / is a scalar function defined on the real line. For instance, a shrinkage function can be used, 
possibly involving some threshold value. Such a technique is commonly used in regression, when outliers 



February 2, 2008 



DRAFT 



5 



have to be removed in order to improve the representativity of the fit [28]. Although r('')(k) does not 
necessarily depend on other observed samples, for structured signal or image analysis, neighboring 
samples often present some correlations. Consequently, an improvement can be expected if s (k) is 
calculated with the help of a subset ^(^.^^'(k) of observed sample locations. Average or median filtering 
[29, p. 243-245] are examples where the estimated sample depends on its neighborhood. As a result, a 
more general estimation rule is: 

f}k)=/((rW(k')),,,^<.(,)). (2) 

With underlying Markovian assumptions, the context set {r^''\^')}^,^^{b)^^^ can be restricted to a limited 
number of values around the sample location k. These values can be gathered in a vector r^^'(k) which 
will be designated as the Reference Observation Vector (ROV). We have then 

^''|k)=/(r(^'(k)). (3) 

The multivariate case can also be described by such a formula if we allow the ROV to contain additional 
samples from the remaining channels in order to exploit the inter-component statistical dependencies. 

Another degree of freedom lies in the choice of a suitable domain for data representation. While 
virtually any transform can be chosen, special attention has been paid to multiscale transforms. For 
example, if a decomposition onto an M-band wavelet basis (M > 2) [30] is performed, the observed 
images are represented by coefficients ^^^^(k) defined at resolution level 7 > 1 and subband index m G 
and the corresponding ROV will be denoted ry''^(k). Since the noise is usually less correlated than the 
data, the DWT is applied in order to provide a sparser representation of the data of interest, before further 
anal 

5'y''^(k) of the original images: 



analysis [3], [4]. The goal becomes to generate estimates s /.m(k) of the unknown wavelet coefficients 



X 



{b) 



r,:„(k)=/(fg,(k)). (4) 

Then, the inverse DWT is applied to the estimated coefficients in order to reconstruct the estimated signal 

xH , 

s (k) in the spatial domain. In the literature concerning denoising, two key issues have been addressed. 
The first one lies in the definition of the ROV. The second one concerns the choice of an appropriate 
function / or, in other words, a suitable expression of the estimator. In the next subsection, we give a 
brief overview of the main ROVs proposed until now. 
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B. Reported ROVs in the DWT domain 

Popular componentwise methods operating in the DWT domain are Visushrink [5] and SUREshrink 
[7]. They both employ a very basic ROV reduced to a scalar value: 

fg,(k)=r;i(k). (5) 

Similarly to what can be done in the spatial domain, the wavelet coefficients can also be processed by 
block rather than individually, again in a mono-channel way [23], [24], [31]-[33]. The main motivation 
for this technique is to exploit the spatial similarities between neighboring coefficients in a given subband. 
The introduction of d —I spatial neighbors ki ,. . . , k^_i of the current sample indexed by k in the ROV 
allows to take into account the spatial dependencies: 

= [r;.i(k),r;.2,(kO,...,r;.i(k,_0]^. (6) 

For higher dimensional data, the ROV may also consist of coefficients sharing similar orientations, 
possibly within different scales [34]. Another generalization of the scalar case takes into account the 
inter-scale similarities between the current coefficient and the homologous ones defined at other scales. 
Based on empirical observations in image compression [35], it has been proposed to use the current 
coefficient ancestors at coarser scales 7 + 1, 7 +2, . . . , 7'm eventually up to the coarsest level / [36]-[38]: 
the ROV ry^^(k) thus includes the corresponding jn\ — j+ 1 coefficients at location k, in subband m, at 
resolution level j. 

In the case of multicomponent data, additional samples borrowed from the different channels can be 
included in the ROVs, as shown in [34], [39] for color image as well as for multispectral image denoising. 
Basically, the inter-component correlations can be taken into account through the following ROV [22]: 

f5i(k) = [r;i(k),...,r(^(k)]T. (7) 

Such a ROV includes all the coefficients of all channels at the same spatial location, in the same subband 
m and at the same resolution level j. In [25], a more sophisticated multicomponent ROV r^''^(k) has 
been defined which combines both spatial and multichannel neighbors. 

As particular cases, such an ROV encompasses the ROV in (|7]l and, also the ROV in ([6]). In addition, 
the ROV may include coefficients from different subbands. 

A final potential extension of the ROVs is related to the choice of the transform. Indeed, it has been long 
observed that a decomposition onto a wavelet basis suffers from a lack of shift-invariance as well as a poor 
directionality, resulting in denoising artifacts at low signal to noise ratios. A simple way for alleviating 
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these problems is to use a frame decomposition built from a union of wavelet bases. In particular, 
a number of papers [40]-[42] have demonstrated significant improvements in scalar shrinkage when 
resorting to a translation-invariant wavelet representation. The latter can be viewed as a decomposition 
onto a union of shifted versions of a unique wavelet basis. M-band dual-tree wavelet decompositions [27] 
constitute another example of a union of 2 (resp. 4) wavelet bases in the real (resp. complex) case. The 
corresponding mother wavelets are then derived from the first one by Hilbert transforms, which results 
in an improved directional analysis. For such frame decompositions, one can extend the notion of ROV 
to include samples produced by the different wavelet basis decompositions operating in parallel. These 
facts will be further developed to motivate the application of the general estimator proposed in this paper 
to an M-band dual-tree wavelet frame [27]. 

C. A unifying framework for shrinkage functions 

In the aforementioned works, the estimation is often performed by shrinkage, so exploiting the sparse- 
ness of the representation. The most well-known method was proposed in the pioneering works of Donoho 
and Johnstone [5]. The estimating function / is then given by 



for a soft thresholding with threshold value X > 0, where sign(-) is the signum function. Equivalently, 
by using the ROV in ([5]), the estimating function can be expressed as 



Some works [43] have focused on the improvement of the scalar shrinkage rule, yielding for instance 
smoother functions such as the garrote shrinkage based on [44], which is defined as: 



Several authors have proposed vector-like generalizations to the scalar shrinkage. Cai and Silverman 
[23], have proposed a block estimator which takes into account information on neighboring coefficients 
in each subband, as expressed as in This estimator dominates the maximum likelihood estimator when 
the block size is greater than 2. This method, named "NeighBlock", consists in applying the following 
shrinkage rule: 



/(r):':(k)) = sign(r^.:':(k))max{|r;:'^(k)| -^,0} 



(8) 




(9) 




(10) 




(11) 
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(b) 

where X> 0, d is the number of components in the ROV, ryj^(k) is a subpart of the ROV, s /.m(k) is 
the associated vector of estimated values, ||.|| denotes the classical Euclidean norm of M.^ and denotes 
the noise variance. Such a function is clearly reminiscent of the scalar garrote shrinkage defined in ( fTOl ). 
Based on an asymptotic minimax study, Cai and Silverman suggested appropriate values for X and d. 
They considered both overlapping and non-overlapping variants of this approach. In particular, the so- 
called "NeighCoeff" method corresponds to the case when s ;,m(k) reduces to a scalar estimate. Then, 
the corresponding estimating function is: 

(b) f\\^fmW-'^^^\ (b) 

V l|r;-m(k)P / + 

In the meantime, §endur and Selesnick [45] introduced a Bayesian approach allowing to model inter- 
scale dependencies between two consecutive levels. These authors consequently formulated the problem 
in the 2-band wavelet domain. In their approach, the ROV is given by f|''^(k) = [r|^^(k),ry^\ m(ril)]^' 
^f+i m( r|l ) being the "parent" of r^j'l^ik) (at the next coarser resolution). By considering as a prior model 
the non-Gaussian bivariate probability density function 



/'(4'i(k),4+i,m(r^l)) - exp(-^^|4>)|^ + |45,jr^l)p), a. > O (13) 
the following Maximum A Posteriori (MAP) estimator was derived: 

where the noise variance is again denoted by a^. 

More recently, in the context of signal restoration problems, Combettes and Wajs [46] have studied the 
properties of proximity operators corresponding to the solutions of some convex regularization problems. 
In particular, an interpretation of one of their results is the following. Let us adopt a Bayesian formulation 
by assuming that the vector r^^^(k) is a noisy observation of the vector s^^^(k) of multichannel coefficients 
at location k, embedded in white Gaussian noise with variance a^. Further assume that the vectors 
sfl^{k) are independent of the noise, mutually independent and have a prior distribution proportional to 
exp(— • II) with X>0. The MAP estimation of s^''^ is found by solving the optimization problem: 

It is shown in [46] that the minunizer of the MAP criterion is 



W /||fl(k)||-Xa 



f,LIM)- (16) 
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The three previous block-thresholding estimators have been derived from different perspectives and 
they have also been applied in different ways. However, it is possible to describe them through a general 
shrinkage factor tl^idlf where 

T / + 



Vxg: 



(17) 



and P > and X > take specific values in each of the aforementioned block estimators. We also remark 
that this generalized shrinkage obviously encompasses the soft and garrote thresholdings provided in ^ 
and 



III. Proposed nonlinear estimator 



A. Notations 



We will now propose a more general adaptive estimator that can be applied in any representation 
domain. We will therefore drop the indices j and m and we will consider the general situation where an 
observation sequence (f(k))kg22 of J-dimensional real-valued vectors (J € N, J > 1) is defined as 

VkGZ^, r(k) = s(k)+n(k), (18) 

where (n(k))k£22 is a fA£(0,r^"^) noise and (s(k))i5gZ2 is an identically distributed second-order random 
sequence which is independent of {n{k))]^^^2. We will assume that the covariance matrix F'^"^ is invertible. 
These random vectors are decomposed as 



rfk) 



r(k) 
f(k) 



s(k) 



.(k) 
s(k) 



nfk) 



n{k) 
n(k) 



(19) 



where r(k), ^(k) and ?i(k) are scalar random variables. We aim at estimating the first component ^(k) 
of the vector s(k) using an observation sequence (r(k))k£K where IC is a finite subset of Z^. We recall 
that, although ([TSl l does not introduce an explicit dependence between ^(k) and the vector f (k) of the 
last d —I components of r(k), such a statistical dependence may exist, due to the dependence between 
the components of s(k) themselves. The estimated sequence will be denoted by (5(k))k(EK- 



B. Form of the adaptive estimator 

In order to gain more flexibility in the denoising procedure, the following generalized form of shrinkage 



estimate will be considered: 



^(k)=ii,(||r(k)||P) qTr(k) 



(20) 
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where the function r\x{-) is given by ([TT] ) with X > 0, P > and q G M''. The vector q corresponds to a Hnear 
parameter. We notice, in particular, that if the threshold value X is set to zero, the considered estimator 
reduces to s (k) = q^r(k). This shows that linear estimators constitute a subset of the considered class 
of estimators. In addition, by an appropriate choice of the vector q, estimators consisting of a preliminary 
decorrelation of the data followed by a thresholding step also appear as special cases of the proposed 
estimator. Note that, in conventional multichannel data analysis, it is customary to decorrelate the data 
before processing. The most common examples are fixed channel conversions (like those from stereo to 
mono in sound processing or from RGB to luminance/chrominance components in color image or video 
processing). When the data modalities are less standardized (for instance in satellite imaging), adaptive 
methods such as the Karhunen-Loeve transform or Independent Component Analysis (ICA) [47] can 
be used. The latter adaptive transforms can also be performed in the transformed domain, e.g. in each 
subband. 

Furthermore, in order to limit the computational complexity in the implementation of the estimator, it 
can be useful to constrain the vector q to belong to some vector subspace of reduced dimension d' < d. 
Let P G M''^'' be the matrix whose column vectors form a basis of this subspace. We have then q = Pa 
where a G M'' . As a simple example, by choosing 

h' 
O 

where Id' denotes the identity matrix of size d' x d', we see that we only introduce in the estimator a 
linear combination of the first d' components of the vector r(k). In summary, the proposed form of the 
estimator is parameterized by X, |3 and a for a given choice of P. 



Our objective is to find the optimal parameters that minimize the quadratic risk defined as (3,a) = 
E[|5(k) — 5(k)p], for a predefined value of P. It is easy to show that the risk reads: 

/?(X,P,a) = E[|.(k)-^(k)|2] 

= E[|.(k)|2] + E[|ii,(||r(k)||P)aTpTf(k)|2]-2E[m(l|r(k)||VP^f(k).(k)]. (21) 

The minimization of the risk is not obvious for any observation model. Indeed, since the s(k) are unknown, 
it seems impossible to express the rightmost term E[ri;(.(||r(k)||P)a^P^r(k)5'(k)]. However, in the case of 
a Gaussian noise, it is possible to apply an extension of Stein's principle [19] for deriving an explicit 
expression. In the next subsection, we will state and prove such extended Stein's formula. 
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C. Stein's formula 

Proposition 1: Let / : M'^ — > M be a continuous, almost everywhere differentiable function such that 

(t-e)T(r("))-i(t-e) 



VeeR"', Um /(t)exp 



E[|/(r(k))|2]<+oo and 



2 

I ar(k) I 



0; 



< +00. 



Then, 



E[/(r(k)).(k)] = E[/(r(k))r(k) 



Proof: Let T 



ar(k) 



E[nn]. 



be a continuous, almost everywhere differentiable function such that 

(t-e)T(r("')-i(t-e) 



VeeM'', lim T(t)exp 

||t|H+oo 



E[||T(r(k))f] <+oo and E 



2 

,3T(r(k)) 



0; 



< +00. 



(22) 
(23) 

(24) 

(25) 
(26) 



where || • ||f is the Frobenius norm. In this multivariate context. Stein's principle [19] can be expressed 
as 

(27) 



E[T(r(k))sT(k)] = E[T(r(k))rT(k)] - E[^If^]r(^'. 



ar' (k) 

Eq. ((24)) follows by choosing T : t h-> [/(t),0, . . . ,0]^ and focusing on the top-left element of matrix 
E[T(r(k))sT(k)]. ■ 



D. Risk expression 

We define the function / : u i— > ri;^(||u||P) a^P^u. It is easy to check that this function / satisfies the 
conditions of Prop. [T] Consequently, the last term can be calculated thanks to (l24l ). This yields 



E[.(k)/(r(k))] = E[r(k)/(r(k))]-E 



df{f{k)) 



dr{k) 



p{n,«) 



(28) 



where T^""' = E[n(k)n(k)]. We then have 



ar(k) 



ar(k) 



ar(k) 



f(k)f > X}^^5?^fT(k)Pa + Ti,(||r(k)||P)Pa 



~ ||r(k)||P+i^'"^'""' " af(k) 
= il,(||r(k)||P)Pa + Xr(k)^^(k)Pa (29) 

where ^(k) = 1{|| r(k)||P > X} ij^Jp^ r (k) . This leads to the following expression of the risk: 

7?(X,P,a) = E[|r(k)-/(r(k))|2] + 2E[iix(||r(k)||P)]aTpTr("-) + 2XaTpTE[^^ (30) 

where = E\\n(k)\\ 
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We will now look for parameters X, P and a that minimize the risk expression (l30l) for a given choice 
of P. 

E. Determination of the parameter a 

We first aim at calculating the value of a that minimizes the risk (l30l ). By noticing that the risk is a 
quadratic convex function of a, the minimization can be performed by differentiating w.r.t. a and then 
finding a*(X,|3) such that a7?/aa(X,(3,a*(X,p)) =0. It readily follows that 

a*(X,|3) = (pTE[ii2(||r(k)f)f(k)rT(k)]P)-'pT(E[Ti^||f(k)||P)r^^^^^ 

- E[Ti,(||r(k)||P)]r("'") -XE[^(k)rT(k)]r("'")). (31) 

F. Determination of the parameters X and |3 

Starting from (l30l ). the risk 7?(X,|3,a) can be re-expressed as 7?(X,P,a) = E[p;^ [5 a(k)] where 

P;.,p,a(k) =«2(k)X2 + ai(k)X + ao(k) (32) 

and 

ao(k) = r\k)-a^ + l{\\ r(k) f > ^ija^P^ (2 r'"'") + (a^P^r (k) - 2 r (k) ) r(k) 



|r(k)||P 

a,(k) = l{||r(k)f>M^'"''"'^''^^ 



2 



|r(k)fP • 

In practice, under standard mixing assumptions for (iT(k))i,gZ2 and (s(k))j.£22 [48], 7?(X,P,a) can be 

A 

estimated via an empirical average R(k, P,a) computed over K, provided that the data length K = card(]K) 
is large enough. Following a procedure similar to the search implemented for the SUREshrink estimator, 
we will subsequently determine optimal values of X and P for this consistent risk estimate. More precisely, 
the norms of the ROVs (||f(k)||)keK are first sorted in descending order, so that ||r(ki)|| > ||r(k2)|| > 

_ A 

... > ||r(k/f)||. To study the variations of P,a) w.r.t. X, we consider the case when X G /,(, with 
io G {1,...,^+1} and 

|r(ki)||P,oo) if/o = l 

4 = <; [||r(kJ||M|r(k,„_Of) if /oG{2,...,^} (33) 
[0,||r(kjf)||P) if/o=i^+l. 
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On the interval /,„, the risk estimate then takes the following form 

^(X,P,a) = l('|£'px,p,a(k,) + Lpx,p,a(k,)) (34) 



K \ . , 

1=1 l = lo 

= i E «2(k,) + X £ ai (k,) + ao(k/) + f ^^(k,) -{K+l- io)a^) . (35) 

^ i=l 1=1 1=1 I=!o 

A 

In other words, |3,a) is a piecewise second-order polynomial function of X. Assume now that /q G 
{2, . . . ,K}. For given values of P and a, the minimum over R of the polynomial in (l35l ) is reached at 

i:r^^«i(k,) 

2i::::;'a2(k 

The minimum over [||r(k/J||P, ||r(k/„_i)||P] of the estimated risk is therefore given by 



X,„(P,a) if [||r(k,„)||P<X,„(|3,a)<||r(k,_i)f 
= <! ||r(k,)||P if ?.,„(P,a) < ||r(k,)||P (37) 

|r(k,_0||P ifl,„(P,a)>||r(k,_i)||P. 
The minimizers Xj((3,a) and X^^j(P,a) of the estimated risk over and I^+i can be found in a similar 
way. The global minimizer X*(|3,a) of the estimated risk is subsequently computed as 

r(|3,a) = arg min ^(X* (|3,a),P,a). (38) 

To determine the optimal value (3* (a) of the exponent P, we can then proceed to an exhaustive search 
over a set 1/ of possible values for this parameter by choosing 

r(a) = argmin^(r(p,a),|3,a). (39) 

In our experiments, it was observed that a restricted set of a few search values is sufficient to get good 
results. 

G. Iterative optimization algorithm 

The optimal expression of the vector a is derived in a closed form in Section IIII-EI as a function of 
the parameters X and p. In this way, the optimization problem simply reduces to the determination of 
the latter two parameters. On the other hand, given a, a procedure for determining the optimal values of 
X and P is described in Section IIII-FI In order to get optimized values of the estimator parameters, we 
therefore propose to apply the following iterative optimization approach: 

'We adopt here the convention ^^^j • = Y.f=K+l ' ~ 
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1) Initialization: Fix P and 1^. Set the iteration number p = I and a^") = [1,0, .. . ,0]^ G M.'' 

2) Iteration p 

a) Set pfp' = P*(a(^'-i)) and X^^') = X*((3(^),a(''-')) as described in Section [IIITl 

b) Set a(/^' =a*(X(^),p('^)) using (|3TI ) where the expectations are replaced by sample estimates. 

3) Set p ^ p + l and goto step |2] until convergence. 

4) Return the optimized values (X*'^',|3'''',a'''') of the parameters. 

We point out that, although we were not able to prove the convergence of the optimized parameters, the 
generated sequence (7?(X'''',P'''',a'''''))p is a decreasing convergent sequence. This means that improved 
parameters are generated at each iteration of the algorithm. 

IV. MULTICOMPONENT WAVELET DENOISING 

Our objective here is to apply the nonlinear estimator developed in the previous section to noise 
reduction in degraded multicomponent images by considering wavelet-based approaches. The original 
multichannel image is composed of B G N* components 5^^' of size L x L, with b G {l,...,B}. Each 
image component s^''^ is corrupted by an additive noise n^''\ which is assumed independent of the images 
of interest. Consequently, we obtain the following noisy observation field A^'> defined by: 

Vk G K, A''^ (k) = 5^^) (k) + (k) , (40) 

where IfC = {1, . . . ,L}^. Following a multivariate approach, we define: 

' S(k) = [s(l'(k),...,s(*)(k)]T 

VkGK, J n(k) = [«(i)(k),...,n(«)(k)]T . (41) 
^ r(k) 4 [(r(i)(k),...,r(^)(k)]T 
Obviously, the observation model (l40l ) can be rewritten as Vk G K, r(k) = s(k) + n(k). In many 
optical systems, the noise stems from a combination of photonic and electronic noises cumulated with 
quantization errors. Subsequently, we will assume that the noise vector process n is zero-mean iid Gaussian 
with CO variance matrix F^"'. In [1] and [2], this was shown to constitute a realistic assumption for satellite 
systems. It is worth noticing that a non diagonal matrix F'"' indicates that inter-component correlations 
exist between co-located noise samples. 

Hereafter, we will use two decompositions. The first one consists in a critically decimated M-band 
wavelet transform whereas the second one, corresponds to an M-band dual-tree wavelet decomposition 
we recently proposed [27] which permits a directional analysis of images. 
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A. M-band wavelet basis estimation 

1) Model: We first consider an M-band ortlionormal discrete wavelet transform (DWT) [30] over J 
resolution levels applied, for each channel b, to the observation field This decomposition produces 



— 1 wavelet subband sequences rjl^, m G {(0,0)}, each of size LjX Lj (where Lj = L/M^fl at 
every resolution level j and an additional approximation sequence rf^ of size Lj x Lj, at resolution level 



Jb) 



iid.^(0,B("') 



,(») 

'j.m 



(/b) H(i,), 



r(b) 



{b) L 



iid5v:(0,B("l) 



rib) 



(*) 
n- 



Fig. 1. Considered models in the wavelet transform domain (left) and in the dual-tree transform domain (right). 

On the one hand, the linearity of the DWT yields (see. Fig. [T|l: Vk e Kj, r, i„(k) = s,,n,(k) +ny n,(k) 
where IfCy = { 1 , . . . ,Lj}^ and 



sj,M=[s%{k),...y;:M]\ 

n;>(k) = [4i(k),...,n^(k)]T. 

On the other hand, the orthonormality of the DWT preserves the spatial whiteness of nj,m- More 
specifically, it is easily shown that the latter field is an i.i.d. iA£(0,r'"') random vector process. 
A final required assumption is that the random vectors (s/.m(k))t.£K are identically distributed for any 
given value of {j,m). 

2) Associated estimator: As described in Section |llll our estimator can be directly applied to the 
M-band DWT coefficients. As in conventional approaches, the approximation coefficients {i.e. j = J and 
m = (0,0)) are kept untouched. The parameters Xj^m, P/.m and q^ m can be determined adaptively, for 
every subband {j,m) and every component b. In this case, the ROV can be scalar, spatial, inter-component 
or combined spatial/inter-component. More detailed examples will be given in Section |Vl 



^For simplicity, L is assumed to be divisible by M-'. 
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B. M-band dual-tree wavelet frame estimation 
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Fig. 2. Pair of analysis/synthesis M-band para-unitary filter banks. 
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STEP 1 



Fig. 3. Dual- tree 2D. 

1) A brief overview of the decomposition: The M-band real dual-tree transform (DTT) consists in 
performing two separable M-band orthonormal wavelet decompositions in parallel as illustrated by Fig. 
m The one-dimensional wavelets (\|/m)m(£Nj, corresponding to the primal tree (upper branch) are assumed 
known and the "dual tree" ones {^m)meN*„ (used in the lower branch) are built so that they define Hilbert 
pairs with the primal ones. This reads in the frequency domain: Vm G N]^, \j/m((o) = — Jsign(a))\j/,„((o). 
Details of construction are given in [27] and the global scheme of the decomposition is shown in Fig. |3] 
An important point is that the dual-tree decomposition includes a post-processing, consisting of a linear 
isometric combination of the primal/dual subbands (see Fig. [S])- This post-processing constitutes an 
essential step for obtaining a directional analysis. Finally, two sets of coefficients (primal and dual ones) 
are obtained, which means that this representation involves a limited redundancy of a factor two. 
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2) Model: Applying this decomposition to a multichannel image having B components and using 
similar notations to Section lTV-A.ll we obtain the following coefficients for the original data, the observed 
ones and the noise, respectively: 

. before post-processing: (S;- m(k),s^^(k)), (r/,m(k),r^m(k)), (n^- m(k),ii^„(k)); 

. after post-processing: (v,-m(k),v5'^(k)), (uy,m(k),u^„(k)), (w;,m(k),w5'^(k)). 
Note that a post-processing is not applied to all subbands (see [27]) as the Hilbert condition is only 
verified by mother wavelets. As a consequence, the linear isometric combination is not performed for 
subbands processed by low pass filters. More precisely, the post-processing consists of the following 
unitary transform of the detail coefficients: for all m € N]^, 

Vk G Kf, Wj,M = ^ (n„m(k) + n«^(k)) and w«^(k) = -L (n,^(k) - n«^(k)) . (42) 

Similar relations hold for the original and observed data. Furthermore, invoking the linearity property of 
the transform, these coefficients are related by (see. Fig. [T] (right)): 

Vk e Kj, ry,m(k) = Sj,m(k) + n;,m(k) and u,-,m(k) = v;,m(k) + Wj,m(k) 

rfm(k) = s«„(k) +nH^(k) u«„(k) = vf^lk) + w«„(k). (43) 

3) Noise statistical properties: In our recent work [49], [50], a detailed analysis of the noise statistical 
properties after such a dual tree decomposition has been performed. In the sequel, some of the main results 
we obtained are briefly summarized. Let us recall the definition of the deterministic cross-correlation 
function between the primal and dual wavelets: for all {m,m') G N^, 

Vx G M, Y„,,„y (x) = [ \\t,„ {x)\\i^, {x-x)dx. (44) 



■ r*"' 5„„ 5,„2 _,„^ Sj., ^k2-k'2 



We have obtained the following expressions for the covariance fields: for all 7 G Z, m = {mi,m2) G N^, 
m' = (m'i,m'2) G N^, k = {ki,k2) G Kj and k' = ik[,k'2) G Kj, 

EK„(k)(n,-,™:(k'))T] 

E[n«^(k)(n«„,(k'))T]_ 

E[n;,„(k)(ii«„,(k'))^] = r(")y,„„,„; {k[ - fci)y,„„„,^(4 - h) . 

It can be further noticed that, for m 7^ 0, the random vectors n/ ,n(k) and ny'^(k) at a given location k 
are mutually uncorrelated. 

After post-processing, the covariances of the transformed noise coefficient fields can be easily deduced 
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from (02]): for all (m,m') G N*^ and (k,k') G K^, 

E[w,-„(k)(w,,™.(k'))^ =E[n,,^(k)(n,™,(k'))T] + E[n,™(k)(n«„,(k'))T] (45) 

E[w5^„(k)(w«„,(k'))T] =E[n,,^(k)(n,,^,(k'))T] - E{n,„(k)(n«„,(k'))T] (46) 

E[w,-„(k)(w«„,(k'))T]=0. (47) 



In summary, noise coefficients are inter-tree correlated before the post-transform whereas after the post- 
transform, they are spatially correlated. This constitutes an important consequence of the post-processing 
stage. 

4) Associated estimator: In the M-band DTT case, the primal and dual coefficients are both estimated. 
For each component b G {I,... ,B}, the estimator reads: for the subbands which are not linearly combined 
(m ^ N^,), 

t^'Lck) = 11,., (Ilril(k)lie) {qfl,Vrfi{k) (48) 

f^i).) =Tl,„<.(||(r^!^'(k))||C'') (q«L^yrS'(k), (49) 

and, for the combined subbands (m E N;^^), 

f-M=^^.. (Iluil(k)lie) (q;.i)Tu(l(k) (50) 

=il,H.(||(u«|^'(k))f"^') (qSyu»|^'(k), (51) 

where ry|^^(k) and r^|^'(k) (resp. Uy''^(k) and u^|^'(k)) are the ROVs for the primal and dual coefficients 
before (resp. after) post-transformation. Similarly to the DWT case, (^/,m,P;,m,q/,m) and {y^^m^^^.m^^.m) 
can be adaptively determined by minimizing the quadratic risk over the frame coefficients for every 
subband (7,in) and every component b in each tree. Furthermore, the approximation coefficients are 
also kept untouched. The denoised multichannel images are then obtained from the estimated wavelet 
coefficients by inverting the DTT using the optimal reconstruction developed in [27]. In this case, a 
great flexibility exists in the choice of the ROV since the latter can be scalar, spatial, inter-component, 
inter-tree or combined spatial/inter-component/inter-tree as will be illustrated in the next section. 

V. Numerical results 

We now provide numerical examples showing the efficiency of the proposed method. In our simulations, 
we consider different multichannel remote sensing images. For the sake of clarity, we only provide 
experimental results concerning two multispectral images. The first one designated as Tunis corresponds 
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to a part of a SP0T3 scene depicting a urban area of the city of Tunis {B = 3). The second one named 
Trento is a Landsat Thematic Mapper image having initially seven channels. The thermal component (the 
sixth component) has been discarded since it is not similar to the remaining ones. Hence, the test image 
Trento is a B = 6 component image. In order to obtain reliable results from a statistical viewpoint, Monte 
Carlo simulations have been conducted. According to our experiments, averaging the mean square error 
over five noise realizations is sufficient to obtain consistent quantitative evaluations. 

In the following, we discuss several topics: in particular, we compare our method with other recently 
proposed estimators, possibly having a multivariate structure. Then, we consider different pre-processings 
that can be performed on the multichannel data before applying the estimator, thus expecting improved 
results. The ROV being defined in a generic way in the previous section, we also study the influence 
of specific choices of this ROV on the denoising performance as well as the influence of the wavelet 
choice (considering various M-band filter banks). When different decompositions are performed, we 
set the maximum decomposition level so that the size of the approximation fields remain the same. 
Consequently, we decompose the images over 2 levels for a 4-band filter bank structure and 4 levels for 
a dyadic one. 

If a^*' denotes the standard deviation of the clean multichannel component s^'''' (of size Li x L2) we 
define the initial and the final signal to noise ratios SNRj^^jj^j and, SNRg'^l^j in the b-th channel as: 

SNR!l,^101og,,,(|g^^). and SNR£)^,01og,.(|^gl|). (52) 

Then, all the B channel contributions are averaged into global values of the initial and final signal to 
noise ratio SNRinitiai and, SNRfinai. 

A. Comparison with existing methods 

We aim in this section at comparing the proposed approach with several existing denoising methods 
which are briefly described in Table Jl Tests are performed on a 512 x 512 SP0T3 image of Tunis city 
(B = 3) (as some multivariate methods are limited to 3-band images) corrupted by an additive zero-mean 
white Gaussian noise with covariance matrix fJ"' = Is, where Ib denotes the identity matrix of size 
BxB. 

We first study techniques that use orthogonal wavelet transforms. We employ Daubechies wavelets of 
order 4 in all the following estimators: 

1) the Bivariate shrinkage, which takes into account inter-scale dependencies, the last level being 
processed by inverting children and parent role [45]; 
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TABLE I 

Brief description of the tested methods. 



Acronym 


Description 


Ref 


Acronym 


Description 


Ref 


Biv. 


Bivariate shrinkage method 


[45] 


Multivariate methods 


BLS-GSM 


Bayesian Least Squares (BLS) 
Gaussian Scale Mixture (GSM) 
using critically decimated DWT 


[34] 


ProbShrink 

(.X.) 


Multivariate method for 3-band images using 
critically decimated DWT and taking into 
account a (. x .) neighborhood in each channel 


[39] 


BLS-GSM 
+ parent 


BLS-GSM using critically 
decimated DWT and taking into 
account the parent coefficient 


[34] 


ProbShrink 
red. (. X .) 


Multivariate method for 3-band images 
using undecimated DWT and taking into 
account a (. x .) neighborhood in each channel 


[39] 


BLS-GSM 
red. 


BLS-GSM using a full 
steerable pyramid 
(redundant transfoiTn) 


[34] 


Surevect 


Estimator based on an extended SURE 
approach using a critically decimated DWT 


[22] 


Curvelets 


Block estimator using curvelet 
transform: 7.5 times redundant 


[51] 





2) the BLS-GSM method developed in [34] including or not the parent neighborhood and considering 
a 3 X 3 spatial neighborhood!^ 

3) the ProbShrink estimator [39] for multivariate data with a 3 x 3 spatial neighborhood (in each 
channel) c 

4) the Surevect estimator [22], which only takes into account multicomponent statistical dependencies; 

5) the proposed estimator where the set of values taken by (3^''^ is 1^ = {0.5,1,1.5,2}, the ROV is 
represented in Fig. |5lb). A subspace constraint is added on the vector q^*^ so that {<lfm)^'rfl^{k) 
reduces to a linear combination of the multichannel data at the considered location and the 4 spatial 
nearest neighbors. 

The obtained results are provided in Table HIl (the initial SNRs may be different in each channel although 
the noise variance is fixed). For the first three methods, denoising has been performed for each component 
of the multichannel data. For orthogonal wavelets, ProbShrink leads to better results when it is associated 
to a spatial neighborhood than when considering only the pixel value to be estimated. It performs quite 
similarly to the Bivariate shrinkage. The BLS-GSM estimator outperforms these two methods providing 
a gain of approximatively 0.2 dB (up to 0.3 dB by including the parent coefficient in the neighborhood). 
Nevertheless, the Surevect estimator brings more significant improvements and it can be observed that 

^We use the toolbox available from Portilla's website http://www.io.csic.es/PagsPers/JPortilla/. 
^We use the toolbox available from Pizurica's website http: //telin . rug . ac.be/'-^san ja/. 
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TABLE n 

DENOISING results (average values computed over 3 CHANNELS) ON TUNIS IMAGE USING NON REDUNDANT 
ORTHOGONAL TRANSFORMS (SEE TAB.[I1| WITH DAUBECHIES WAVELETS OF ORDER 4 (LENGTH 8). 



<5~ 




Biv 


Probshrink 


BLS-GSM 


BLS-GSM 


Surevect 


Proposed 








(3x3) 




+ parent 


DWT 


DWT 


650.3 


5.081 


11.85 


11.86 


12.05 


12.14 


13.08 


13.42 


410.3 


7.081 


12.89 


12.84 


13.11 


13.21 


14.12 


14.52 


258.9 


9.081 


13.99 


13.91 


14.26 


14.36 


15.24 


15.70 


163.3 


11.08 


15.19 


15.08 


15.49 


15.60 


16.43 


16.96 


103.1 


13.08 


16.49 


16.37 


16.81 


16.93 


17.70 


18.28 


65.03 


15.08 


17.88 


17.54 


18.22 


18.35 


19.04 


19.65 



our method leads to even better numerical results whatever the initial noise level is. The new structure 
of the estimator coupled with a spatial and spectral block processing may explain such an improvement. 
Furthermore, the gain increases as the initial SNR increases, which is interesting in satellite imaging 
where the noise is often of low intensity. To be fair, we would like to mention that, although Bivariate 
shrinkage, Probshrink and BLS-GSM were designed for monochannel image denoising, extensions of 
these methods to the multivariate case could probably be envisaged. 

In the monochannel case, it has been reported that the use of redundant transforms often brings 
noticeable improvements in denoising [51]. We subsequently compare methods that have been proved to 
be very efficient when combined with a redundant analysis: 

1) the curvelet denoising [511 using a curvelet frame with a redundancy approximatively equal to 7.5 
and a block thresholding |f 

2) the BLS-GSM method using steerable pyramids with 8 orientations, including the parent neighbor- 
hood and a 3 X 3 spatial neighborhood as described in [34], 

3) the Probshrink estimator for multivariate data using undecrmated wavelet transform [39] (with 
Daubechies wavelets of length 8) and taking into account a 3 x 3 or no spatial neighborhood; 

4) the Surevect estimator [22], extended to DTT (with Daubechies wavelets of length 8); 

5) the proposed estimator using a DTT where V = {0.5, 1, 1.5,2}, the ROV is represented in Fig.[6tb). 
The vector q^''^ (resp. q^l^') is such that it introduces a linear combination of the multichannel 
data in the primal (resp. dual) tree at the considered location and the 4 spatial nearest neighbors. 

^We employ the Curvelab 2.0 toolbox which can be downloaded from http://www.curvelet.org. 
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TABLE m 

DENOISING results (average values computed over 3 CHANNELS) ON TUNIS IMAGE USING REDUNDANT 
TRANSFORMS (SEE TAB.[T1| WITH DAUBECHIES WAVELETS OF ORDER 4 (LENGTH 8). 



a2 


SNRi„it 


Curvelets 


BLS-GSM red 


Probshrink red 


Probshrink red 


Surevect 


Proposed 








+ parent 


(3x3) 


(1x1) 


DTT 


DTT 


650.3 


5.081 


11.91 


12.92 


13.00 


13.33 


13.54 


13.72 


410.3 


7.081 


12.94 


14.00 


14.04 


14.38 


14.59 


14.80 


258.9 


9.081 


14.04 


15.15 


15.13 


15.50 


15.70 


15.97 


163.3 


11.081 


15.17 


16.38 


16.28 


16.68 


16.87 


17.21 


103.1 


13.081 


16.33 


17.68 


17.51 


17.92 


18.11 


18.52 


65.03 


15.081 


17.56 


19.04 


18.76 


19.20 


19.42 


19.88 



It is worth pointing out that the same noisy images as used in the non redundant case have been processed 
by the redundant transforms. As shown in Table |llll curvelets do not seem really appropriate in this 
multichannel context in spite of their promising results in the monochannel one. ProbShrink and BLS- 
GSM methods are very efficient in the redundant case and ProbShrink shows its superiority when using an 
inter-component neighborhood. The methods using a DTT outperform the existing ones in all the cases. 
We point out that the DTT has a limited redundancy of a factor 2 compared with the other considered 
redundant decompositions. It can be noticed that our method provides better results than Surevect. The 
observed gain increases as the initial SNR increases and we obtain significant improvements with respect 
to critically decimated transforms of about 0.25 dB. It is also interesting to note that the observed gain 
in terms of SNR leads to quite visible differences. In Fig. |4l cropped versions of the first channel of the 
Tunis image are displayed, for a low value of the initial SNR (4.66 dB). We can notice that the proposed 
method (see Fig. |4l-(f)) allows to better recover edges whereas the three others (see Fig. |4l-(c,d,e)) result 
in more blurred images, where some of the original structures are missing. This is especially visible for 
the image denoised with the BLS-GSM estimator (see Fig. |4l-(d)). 

In the following, we focus on the method introduced in this paper and more specifically on the variations 
of its performance according to the parameter setup. 

B. Pre-processing stage 

In order to improve the denoising performance in the multichannel context, additional linear procedures 
can be applied. Actually, different linear pre-processings of the components may be envisaged: 
• The simplest idea consists in decorrelating the spectral components of the image to be estimated in 
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(d) (e) (f) 



Fig. 4. Cropped versions of Tunis image (channel b= I, initial SNR equal to 4.66 dB) and (a) Original image, (b) Noisy image, 
(c) Denoised image using ProbShrink red. (1 x 1), (d) Denoised image using BLS-GSM red. + parent method, (e) Denoised 
image using curvelets and (f) Denoised image using our method (employing a DTT). 



order to process them separately. Knowing the noise covariance matrix F^"', we can deduce the original 
data covariance matrix (assumed here to be spatially constant): F^*^ = F*"^^ — F*"^, from the observed 
data covariance matrix F^"^'. More precisely, by performing an eigendecomposition of F^^^, we seek 
for an orthogonal matrix U^*' such that: F^^^ = U(*'D(*'(U(^))^ where D^^^ is a diagonal matrix. Then, 
the transformed multichannel image is ((U^^')^r(k))k and it is corrupted by a spatially white zero- 
mean Gaussian noise with covariance matrix (U'*^)^F'"'u^*'. We then proceed to the nonlinear wavelet 
estimation of the decoiTclated components as described in the previous sections. 

• Instead of decorrelating the components, we may try to make them statistically independent or, at 
least, as independent as possible. A number of ICA (Independent Component Analysis) methods have 
been developed for this purpose in recent years [47]. In this case, a linear transform V^^^ (which is not 
necessarily orthogonal) is applied to the multichannel data. 

The proposed estimator already includes an optimized linear combination of some of the components 
of the ROV. It is therefore expected to provide competitive results w.r.t. techniques involving some linear 
pre-processing. In order to make fair comparisons and evaluate the improvements resulting from the 



February 2, 2008 



DRAFT 



24 



optimization of the linear part of the estimator, we provide simulations where the ROV is the same 
whatever the pre-processing is (we have chosen the same ROV as in the previous sections). In addition, 
when a decorrelation or an ICA is employed, the linear part of the estimator is chosen equal to the 
identity. We finally propose to compare these results with a simple linear MSE estimator based on a 
linear combination of coefficients from different channels. 

Numerical results displayed in Table |IV] allow us to evaluate the proposed approach without optimiza- 
tion of the linear parameter vector, the same estimator combined with an ICA of the multichannel data 
(using the JADE algorithm [47]) or a pre-decorrelation stage and, finally our approach with an optimized 
linear part. From these results, it is clear that including some linear processing is useful for multichannel 

TABLE IV 

Influence of different pre-processings on Tunis image denoising (o^ = 258.9). Symlets of length 16 are 



USED. 



Transform 


Channel 


SNRinit 


Without transf. 


ICA 


Decorrelation 


MSE Lin. 


Opt. lin. 




b=l 


8.664 


13.84 


14.66 


15.15 


15.18 


15.75 


DWT 


b = 2 


9.653 


14.39 


15.03 


15.36 


15.28 


15.90 




b = 3 


8.926 


15.15 


13.85 


15.11 


15.26 


15.86 




Average 


9.081 


14.46 


14.51 


15.21 


15.24 


15.84 




b=l 


8.664 


14.13 


14.37 


15.42 


15.42 


15.95 


DTT 


b = 2 


9.653 


14.66 


14.67 


15.64 


15.53 


16.10 




b = 3 


8.926 


15.38 


14.26 


15.25 


15.52 


16.01 




Average 


9.081 


14.72 


14.43 


15.44 


15.49 


16.02 



image denoising. The ICA only brings slight improvements, possibly due to the fact that the associated 
transform is not orthogonal. Pre-decorrelating the data significantly increases the SNR, however the fully 
optimized version of our estimator remains the most effective method. 

C. Influence of the neighborhoods 

The ROV can be defined as desired and plays a prominent role in the construction of our estimator. 
We study here the influence of different choices of the ROV: 

1) ROVl corresponds to an inter-component neighborhood. When a DWT is employed (see Fig. I2a)), 
we have r^''^(k) = [(''y-^m(k))^,]^, while for a DTT (see Fig. Oa)), we use 

= [&))y , [rf^O^)),]" and n^^.) = (53) 
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2) R0V2 corresponds to a combination of a spatial 3x3 and an inter-component neighborhood as 
considered in the previous sections and shown in Figs. Ob) and Sb). 




(a) (b) 



Fig. 5. Representation of tlie different considered ROVs in the DWT domain (tlie black triangle will be estimated taking 
into account the white ones); (a) ROVl the purely inter-component one and (b) ROV2 combining inter-component and spatial 
dependencies. 

The Unear part of the estimator is defined as in Section |V-A[ 
The corresponding results are given in Table |Vl 

TABLE V 

Influence of the neighborhood in Tunis image denoising (average values computed over 3 channels are 
provided and = 258.9) using symlets (length 16) (top) and ac filter bank (length 16) (bottom). 



Transform 


SNRin,, 


ROVl 


ROV2 


Transform 


SNRin,t 


ROVl 


ROV2 


DWT (symlets) 
DTT (symlets) 


9.081 
9.081 


15.42 
15.77 


15.84 
16.02 


DWT (AC) 
DTT (AC) 


9.081 
9.081 


15.48 
15.88 


15.77 
16.02 



In order to compare different possible wavelet choices, the results are provided both for symlets of 
length 16 and a 4-band filter bank given in [52] which is denoted by AC. These results can also be 
compared with the ones given in Section IV- A I where Daubechies filters of length 8 are used. 
Concerning the neighborhood influence, we note that taking into account spatial dependence leads to a 
significant improvement w.r.t. inter-component dependence. In all cases, combining spectral and spatial 
neighborhood leads to the best results. 
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Post— processing Post-processing 



(a) (b) 

Fig. 6. Representation of thie different considered ROVs in tlie DTT domain, witli and without post-processing stage (the 
blaclc triangle will be estimated taking into account the white ones); (a) ROVl the purely inter-component one and (b) R0V2 
combining inter-component and spatial dependencies. 

Concerning the wavelet choice, it appears that the 4-band AC wavelets yield slightly better results than 
the dyadic symlets choosing ROVl and equivalent results choosing ROVS. Both outperform Daubechies 
wavelets wathever the ROV chosen. 

D. Various noise levels 

In this section, we consider that the image channels are corrupted at different noise levels. Thus, the 
noise is spatially white, zero-mean, Gaussian with covariance matrix Fj"^ = Diag(ai, . . . ,C7|). 

TABLE VI 

Denoising results on Tunis image considering Tj"' and using symlets (length 16). 



Channel 





SNRinit 


Surevect 


Proposed DWT 


Surevect DTT 


Proposed DTT 


b=\ 


25.89 


18.66 


20.58 


21.15 


20.84 


21.24 


b = 2 


258.9 


9.653 


18.53 


18.63 


18.76 


18.84 


b = 3 


491.9 


6.138 


14.20 


14.57 


14.55 


14.71 


Average 




11.49 


17.76 


18.12 


18.05 


18.26 



The resulting numerical results are displayed in Table |Vl] with the corresponding noise levels, when 
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our estimator is used with R0V2. Noticeable differences can be observed when comparing Surevect with 
our method both considering DWT and DTT transforms. 

E. Increased number of channels 

A strong advantage of the proposed method is that, unUke many multicomponent approaches Umited to 
RGB (3 components) images, it may process any kind of multichannel images, whatever the number of 
channels is. We consider here the 6 channel Trento image. We apply the Surevect estimator (both using 
DWT and DTT), the BLS-GSM estimator (taking into account the parent coefficient), and our estimator 
using R0V2. From the results provided in Table |VII[ we see that, while the number of channels is 

TABLE VII 

Results obtained applying different estimators on Trento image (o^ = 258.9). 



Channel 


SNRin,t 


Surevect 


Proposed 


BLS-GSM red 


Surevect 


Proposed 








DWT 


-1- parent 


DTT 


DTT 


b=\ 


-2.907 


8.661 


8.945 


8.311 


8.984 


9.255 


b = 2 


-6.878 


8.375 


8.430 


6.536 


8.805 


8.876 


b = 3 


-3.836 


8.288 


8.430 


7.341 


8.647 


8.749 


b = 4 


2.428 


9.525 


9.796 


9.836 


9.901 


10.00 


b = 5 


4.765 


11.18 


11.53 


11.38 


11.61 


11.78 


b = 6 


-1.560 


9.545 


9.700 


8.167 


9.945 


10.02 


Average 


-1.331 


9.262 


9.472 


8.596 


9.649 


9.780 



increased, our method still outperforms the other ones especially when a DTT is used. With the increase 
of the number of channels, the reduced redundancy of the DTT becomes another attractive feature of the 
proposed approach. 

VI. Conclusion 

In this paper, we have proposed a nonlinear Stein based estimator for wavelet denoising of multichannel 
data. Due to its flexible form, the considered estimator generalizes many existing methods, in particular 
block-based ones. Although the proposed approach has been applied to satellite images, it could also be 
used in any multivariate signal denoising problem. Besides, the estimator has been used in conjunction 
with real dual-tree wavelet transforms but complex ones or other frame decompositions could be envisaged 
as well. In the context of frame representations, it should however be noticed that the proposed estimator 
minimizes the risk over the frame coefficients and not on the reconstructed signal, which may be 
suboptimal [21], [53]. Another question that should be investigated in a future work is the ability of 
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the proposed framework to exploit inter-scale dependencies in addition to spatial and inter-component 
ones, as considered in [21] for the mono-channel case. In order to obtain an interscale denoising method, 
an appropriate ROV should be defined and the interscale statistics of the noise should be available. 
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